Accept binary file-like objects in to_geotiff and the readers #1512
Merged
brendancol merged 2 commits into xarray-contrib:main on May 8, 2026
Closes xarray-contrib#1511.

- New `_BytesIOSource` wraps any read+seek file-like; a lock around seek+read keeps thread-pool windowed reads race-free.
- `_open_source`, `read_to_array`, `_read_geo_info`, and `open_geotiff` accept either a string or a file-like.
- `to_geotiff` accepts any path with a `write` method; `_write_bytes` writes straight to the buffer for file-likes and keeps the temp-file + `os.replace` atomic write for string paths.
- Reject `cog=True` for file-likes (deferred), `gpu=True` / `chunks` for file-like sources, and gate VRT branches on `isinstance(..., str)` so buffers can't accidentally hit the VRT code path.
- Dask + file-like falls back to eager in-memory assembly, since `write_streaming` patches IFD offsets in place on a temp path.

Tests: `xrspatial/geotiff/tests/test_bytesio_source.py` covers round-trip, windowed read, COG/VRT rejection, and concurrent reads from one source.
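For illustration, here is a minimal sketch of why the lock matters, assuming a plain `io.BytesIO` buffer. `LockedBufferSource` and `read_range` are hypothetical names, not the actual `_BytesIOSource` API. A file-like object has one shared cursor, so without the lock two threads could interleave `seek` and `read` and each receive bytes from the other's offset.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedBufferSource:
    """Hypothetical ranged reader over a seekable binary file-like object."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._lock = threading.Lock()
        # Size the buffer by seeking to the end and asking for the position.
        fileobj.seek(0, io.SEEK_END)
        self._size = fileobj.tell()

    @property
    def size(self):
        return self._size

    def read_range(self, offset, length):
        # seek + read must happen as one step: the cursor is shared by all threads.
        with self._lock:
            self._fileobj.seek(offset)
            return self._fileobj.read(length)


# Usage: many threads reading different 64-byte windows from one buffer.
data = bytes(range(256)) * 16          # 4096 bytes of known content
source = LockedBufferSource(io.BytesIO(data))
offsets = list(range(0, source.size, 64))

with ThreadPoolExecutor(max_workers=8) as pool:
    chunks = list(pool.map(lambda off: source.read_range(off, 64), offsets))
```

The lock serializes only the seek+read pair, so decoding of the returned bytes can still proceed in parallel across workers.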
Pull request overview
This PR extends the GeoTIFF reader/writer APIs to accept in-memory/binary file-like objects (e.g., io.BytesIO) in addition to string paths, enabling fully in-memory read/write workflows and adding tests for round-trip and concurrent windowed reads.
Changes:
- Added a file-like-backed reader source (`_BytesIOSource`) and updated reader entry points to accept `str` or binary file-like objects.
- Updated `open_geotiff`/`to_geotiff` to validate unsupported combinations for file-like sources/destinations (e.g., dask/gpu reads, `cog=True` writes).
- Added a dedicated test module covering BytesIO round-trips, windowed reads, and concurrent access.
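The validation rules described in this PR (buffers rejected for `gpu=True` and `chunks` on read, for `cog=True` and `gpu=True` on write, and dask data falling back to eager assembly for buffer destinations) can be sketched as two small checks. `check_read_options` and `plan_write` are hypothetical helpers and the error messages are illustrative; the PR performs these checks inside `open_geotiff` and `to_geotiff`. Treating every non-`str` as a buffer is a simplification for this sketch.

```python
import io


def _is_buffer(obj):
    # Simplification: anything that is not a str path counts as a buffer.
    return not isinstance(obj, str)


def check_read_options(source, *, gpu=False, chunks=None):
    """Reject reader modes that would re-open the source by path."""
    if _is_buffer(source):
        if gpu:
            raise ValueError("gpu=True is not supported for file-like sources")
        if chunks is not None:
            raise ValueError("chunks is not supported for file-like sources")


def plan_write(path, *, cog=False, gpu=False, is_dask=False):
    """Return 'streaming' or 'eager', or raise for unsupported combinations."""
    if _is_buffer(path):
        if cog:
            raise ValueError("cog=True is not supported for file-like destinations")
        if gpu:
            raise ValueError("gpu=True is not supported for file-like destinations")
        # The streaming writer patches IFD offsets in place on a real file,
        # so dask input is assembled eagerly in memory instead.
        return "eager"
    return "streaming" if is_dask else "eager"


# Usage
mode_buf = plan_write(io.BytesIO(), is_dask=True)   # buffer: eager fallback
mode_path = plan_write("out.tif", is_dask=True)     # str path: streaming
```

Failing early here keeps the error at the call site instead of surfacing later inside a worker task or device-side reader.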
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `xrspatial/geotiff/_reader.py` | Adds file-like detection and `_BytesIOSource` to support ranged reads from seekable buffers. |
| `xrspatial/geotiff/_writer.py` | Extends `_write_bytes` to write to file-like destinations and hardens `_is_fsspec_uri` for non-strings. |
| `xrspatial/geotiff/__init__.py` | Broadens the public API to accept file-like inputs and adds early validation for unsupported modes. |
| `xrspatial/geotiff/tests/test_bytesio_source.py` | Adds new tests for buffer-based read/write and concurrency behavior. |
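A minimal sketch of the two detection helpers mentioned in the table, assuming duck typing on method names. `is_file_like` and `is_fsspec_uri` are hypothetical stand-ins for the private helpers; the `tell` requirement follows the later commit in this PR, and the `"://"` test is a simplified URI check for illustration only, not the library's rule.

```python
import io


def is_file_like(obj):
    """Reader-side gate: a non-str object with callable read, seek and tell."""
    return not isinstance(obj, str) and all(
        callable(getattr(obj, name, None)) for name in ("read", "seek", "tell"))


def is_fsspec_uri(path):
    """Type-check first so buffers never reach string operations."""
    if not isinstance(path, str):
        return False
    # Simplified URI test for illustration only.
    return "://" in path


class NoTell:
    """A read+seek object without tell(); the gate should reject it."""

    def read(self, n=-1):
        return b""

    def seek(self, offset, whence=0):
        return 0


buf_ok = is_file_like(io.BytesIO(b"II*\x00"))
no_tell_ok = is_file_like(NoTell())
```

Checking the type before calling string methods is what keeps a `BytesIO` from raising `AttributeError` inside the URI check.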
Comment on lines +1510 to +1515

```python
    or any binary file-like object exposing ``write``."""
    import os

    # File-like destination: append the encoded bytes. The caller owns
    # the buffer's lifetime (we don't close it).
    if not isinstance(path, str) and hasattr(path, 'write'):
```
Comment on lines +383 to +384

```python
self._size = fileobj.tell()
try:
```
Comment on lines +450 to +452

```python
raise TypeError(
    f"source must be a str path/URL or a binary file-like object "
    f"with read+seek methods, got {type(source).__name__}")
```
Comment on lines +346 to 350

```python
# VRT files (string paths only -- VRT XML references other files on disk)
if isinstance(source, str) and source.lower().endswith('.vrt'):
    return read_vrt(source, dtype=dtype, window=window, band=band,
                    name=name, chunks=chunks, gpu=gpu,
                    max_pixels=max_pixels)
```
Comment on lines +404 to +406

```python
if isinstance(source, str):
    import os
    name = os.path.splitext(os.path.basename(source))[0]
```
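The two `isinstance(source, str)` gates above only catch `pathlib.Path` inputs if those are converted first, which is what the follow-up commit's `_coerce_path` does. A minimal sketch of that ordering, assuming `os.fspath` for the conversion; `coerce_path` and `route_source` are hypothetical names, and the string return values stand in for the real dispatch to `read_vrt` and the TIFF reader.

```python
import io
import os
from pathlib import Path


def coerce_path(path):
    """Normalise os.PathLike (e.g. pathlib.Path) to str; pass buffers through."""
    if isinstance(path, os.PathLike):
        return os.fspath(path)
    return path


def route_source(source):
    """Return (reader_kind, default_name) using str-only branches."""
    source = coerce_path(source)
    if isinstance(source, str) and source.lower().endswith(".vrt"):
        return "vrt", None
    if isinstance(source, str):
        return "tiff", os.path.splitext(os.path.basename(source))[0]
    # Buffers have no filename, so no name is derived.
    return "tiff", None


vrt_route = route_source(Path("mosaic.vrt"))
tif_route = route_source(Path("data") / "elevation.tif")
buf_route = route_source(io.BytesIO(b""))
```

Coercing once at the entry point keeps every later branch a plain `str` check, and a buffer can never match the `.vrt` branch.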
Comment on lines +727 to +730

```python
elif not isinstance(path, str):
    raise TypeError(
        f"path must be a str or a binary file-like with a write() "
        f"method, got {type(path).__name__}")
```
…runcate-on-rewrite, tell()
- `_coerce_path` normalises `os.PathLike` (e.g. `pathlib.Path`) to `str` at
  the top of every public reader/writer entry point. `Path('mosaic.vrt')` now
  routes to `read_vrt`, `Path('x.tif')` derives a name, etc.
- `to_geotiff` rejects `gpu=True` with a file-like destination up front.
The write_geotiff_gpu path was never tested with buffers and would have
hit `_write_bytes(path)` without truncating.
- `_write_bytes` rewinds and truncates the buffer before writing when the
destination supports it. Two writes to the same BytesIO now overwrite
rather than concatenate, matching string-path semantics.
- `_is_file_like` now requires `tell` in addition to `read`/`seek`.
`_BytesIOSource` calls `tell()` to size the buffer; the previous gate
let read-seekable-but-not-tellable inputs through and crash inside
`__init__`. We drop the guarded-tell pattern in the constructor in favour
of a single try/except that raises a clear ValueError if the buffer is
unusable (e.g. closed).
- Drop unused `threading` import from the test file.
- Tests cover Path round-trip, Path('.vrt') VRT routing, Path-derived name,
GPU+buffer rejection, BytesIO overwrite-on-rewrite, and the tell()
requirement at the gate.
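A minimal sketch of the two write paths described above, assuming the "destination supports it" check is `seekable()`. `write_bytes` is a hypothetical stand-in for `_write_bytes`, and the temp-file naming is illustrative. Buffers are rewound and truncated so a second write overwrites; string paths get a temp file in the target directory followed by `os.replace`, so readers never observe a half-written file.

```python
import io
import os
import tempfile


def write_bytes(path, payload):
    """Overwrite a file-like destination, or atomically replace a str path."""
    if not isinstance(path, str) and hasattr(path, "write"):
        # Rewind and truncate when the buffer supports it, so a second write
        # overwrites instead of appending. The caller still owns the buffer.
        seekable = getattr(path, "seekable", None)
        if callable(seekable) and seekable():
            path.seek(0)
            path.truncate()
        path.write(payload)
        return
    # String path: write a temp file next to the target, then os.replace it.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the partial temp file and re-raise the original error.
        os.unlink(tmp)
        raise


buf = io.BytesIO()
write_bytes(buf, b"first payload")
write_bytes(buf, b"second")   # overwrites rather than concatenating
```

The temp file lives in the same directory as the target so that `os.replace` stays a same-filesystem rename.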
Closes #1511.
What changed
- `_BytesIOSource` in `_reader.py` wraps any binary file-like (read+seek). A lock around seek+read keeps thread-pool windowed reads from racing on the buffer's cursor.
- `_open_source` and `read_to_array` take either a string or a file-like.
- `_read_geo_info` reads bytes directly from a buffer. We don't `mmap.mmap` arbitrary file-likes, since they may not be backed by a real file descriptor.
- `open_geotiff` takes a buffer. Raises `ValueError` for `gpu=True` or `chunks=...` with a buffer, since those paths re-open the source by path from worker tasks or device-side readers.
- `to_geotiff` takes any `path` with a `write` method. Raises `ValueError` for `cog=True` + file-like (see Deferred). The `.vrt` branch is gated on `isinstance(path, str)` so a buffer can't accidentally land in the VRT path.
- `_write_bytes` writes straight to the buffer when given a file-like. String paths keep the existing temp-file + `os.replace` atomic write.
- `to_geotiff` falls back from `write_streaming` to eager in-memory assembly for buffer destinations. The streaming writer patches IFD offsets in place and needs a real filesystem path.
- `_is_fsspec_uri` in both modules now type-checks before string ops.

Deferred
`cog=True` to a file-like would need overview passes plus IFD patching against the buffer. Out of scope here.

Tests
`xrspatial/geotiff/tests/test_bytesio_source.py` adds 7 cases: round-trip, uint8 round-trip, windowed read, `cog=True` reject, VRT-extension non-trigger, and concurrent reads from one source via a thread pool.

Full geotiff suite: 674 pass. The 3 remaining failures are pre-existing matplotlib palette tests on origin/main, unrelated to this PR.